In this note we give a proof of the result stated as Theorem 4 in [1].

A collector samples with replacement from a set of $n \in \mathbb{N} := \{1, 2, \ldots\}$ distinct coupons so that the draws are independent and at each time any one of the $n$ coupons is drawn with the same probability $1/n$. For an integer $m_n \in \{0, 1, \ldots, n-1\}$ that depends on $n$, sampling is repeated until the first time $W_{n,m_n}$ that the collector has collected $n - m_n$ distinct coupons. Baum and Billingsley proved in [2] (using the method of characteristic functions) that if

$$m_n \to \infty \quad\text{and}\quad \frac{n - m_n}{\sqrt{n}} \to \sqrt{2\lambda} \quad\text{for some constant } \lambda > 0, \text{ as } n \to \infty, \tag{1}$$

then $W_{n,m_n} - (n - m_n)$ converges in distribution to the Poisson law with mean $\lambda$. Throughout, all asymptotic relations are meant as $n \to \infty$.
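The limit theorem above can be illustrated by simulation. The following is a minimal sketch, not part of the original argument: the function names, the values of `n`, `lam` and `trials`, and the rounding used to pick `m` are all assumptions made for illustration.

```python
import math
import random

def excess_draws(n, m, rng):
    """Sample until n - m distinct coupons are seen; return (number of draws) - (n - m)."""
    seen = set()
    draws = 0
    while len(seen) < n - m:
        seen.add(rng.randrange(n))  # each of the n coupons has probability 1/n
        draws += 1
    return draws - (n - m)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

rng = random.Random(0)
n, lam = 2000, 1.0
# Choose m so that (n - m)/sqrt(n) is close to sqrt(2*lam), mimicking condition (1).
m = n - round(math.sqrt(2 * lam * n))
trials = 5000
counts = {}
for _ in range(trials):
    w = excess_draws(n, m, rng)
    counts[w] = counts.get(w, 0) + 1

for k in range(5):
    print(k, counts.get(k, 0) / trials, round(poisson_pmf(k, lam), 4))
```

The empirical frequencies should be close to, but not exactly equal to, the Poisson probabilities; the size of that gap is what the note quantifies.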

It can be seen that the following equality in distribution holds for $\widetilde W_{n,m_n} := W_{n,m_n} - (n - m_n)$:

$$\widetilde W_{n,m_n} \overset{\mathcal D}{=} \sum_{i=m_n+1}^{n} X_{n,i},$$

where the $X_{n,i}$ random variables are independent, and $X_{n,i} + 1$ has geometric distribution with success probability $i/n$, $i \in \{m_n+1, \ldots, n\}$, $n \in \mathbb{N}$, that is

$$P\{X_{n,i} + 1 = j\} = \left(1 - \frac{i}{n}\right)^{j-1} \frac{i}{n}, \qquad j \in \mathbb{N},\ i \in \{m_n+1, \ldots, n\}.$$

We approximate the waiting time $\widetilde W_{n,m_n}$ with a Poisson random variable that has mean $\lambda_n = \sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right)$. Using the special combinatorial structure of the problem we derive the first asymptotic correction of the probabilities $P\{\widetilde W_{n,m_n} = k\}$, $k = 0, 1, \ldots$, to the corresponding Poisson point probabilities. We note that in principle the method presented in the proof can be extended to determine higher order terms in the asymptotic expansion.

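Theorem. If $\{m_n\}_{n \in \mathbb{N}}$ is a sequence of nonnegative integers that satisfies (1), then

$$P\{\widetilde W_{n,m_n} = 0\} = e^{-\lambda_n} - e^{-\lambda_n} \frac{\lambda_{n,2}}{2} + O\!\left(\frac{1}{n}\right),$$

$$P\{\widetilde W_{n,m_n} = 1\} = e^{-\lambda_n} \lambda_n - e^{-\lambda_n} \lambda_n \frac{\lambda_{n,2}}{2} + O\!\left(\frac{1}{n}\right),$$

and, for $k \ge 2$,

$$P\{\widetilde W_{n,m_n} = k\} = e^{-\lambda_n} \frac{\lambda_n^k}{k!} + e^{-\lambda_n} \left( \frac{\lambda_n^{k-2}}{(k-2)!} - \frac{\lambda_n^k}{k!} \right) \frac{\lambda_{n,2}}{2} + O\!\left(\frac{1}{n}\right),$$

where $\lambda_{n,2} := \sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right)^2$.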
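The size of the correction can be checked numerically. The sketch below is an illustration under stated assumptions: it computes the exact law of $\widetilde W_{n,m_n}$ as a convolution of the geometric variables $X_{n,i}$, and compares it with the plain Poisson probabilities and with the corrected ones, the latter written as $e^{-\lambda_n}\lambda_n^k/k! + e^{-\lambda_n}\big(\lambda_n^{k-2}/(k-2)! - \lambda_n^k/k!\big)\lambda_{n,2}/2$ (with the convention that the $(k-2)$ term vanishes for $k < 2$). That form is my reading of the expansion derived in the proof; the helper names and the values of `n` and `m` are hypothetical.

```python
import math

def exact_pmf(n, m, kmax):
    """P(sum_i X_i = k) for k = 0..kmax, where X_i >= 0 is geometric with success prob i/n."""
    pmf = [1.0] + [0.0] * kmax
    for i in range(m + 1, n + 1):
        p, q = i / n, 1 - i / n
        new = [0.0] * (kmax + 1)
        for k in range(kmax + 1):
            # new[k] = p * sum_j q^j * pmf[k - j], computed via new[k] = p*pmf[k] + q*new[k-1]
            new[k] = p * pmf[k] + (q * new[k - 1] if k > 0 else 0.0)
        pmf = new
    return pmf

def poisson(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k) if k >= 0 else 0.0

def corrected(k, lam_n, lam_n2):
    """Poisson probability plus the first-order correction term (assumed form)."""
    return poisson(k, lam_n) + (poisson(k - 2, lam_n) - poisson(k, lam_n)) * lam_n2 / 2

n, m = 400, 372
lam_n = sum(1 - i / n for i in range(m + 1, n + 1))
lam_n2 = sum((1 - i / n) ** 2 for i in range(m + 1, n + 1))
pmf = exact_pmf(n, m, 60)
for k in range(5):
    print(k, pmf[k], poisson(k, lam_n), corrected(k, lam_n, lam_n2))
```

For these parameters the corrected values track the exact probabilities noticeably better than the uncorrected Poisson values at $k = 0$ and $k = 1$.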
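We note that $\lambda_{n,2}$ equals $\frac{(2\lambda_n)^{3/2}}{3\sqrt{n}}$ up to a term of smaller order. Indeed,

$$\lambda_{n,2} = (n - m_n) - \frac{2}{n}\left[\frac{n(n+1)}{2} - \frac{m_n(m_n+1)}{2}\right] + \frac{1}{n^2}\left[\frac{n(n+1)(2n+1)}{6} - \frac{m_n(m_n+1)(2m_n+1)}{6}\right] = \frac{(n - m_n)(n - m_n - 1)\left(n - m_n - \frac{1}{2}\right)}{3n^2}$$

$$= \frac{(2\lambda_n)^{3/2}}{3\sqrt{n}} + \frac{2\lambda_n}{3n}\left[n - m_n - \frac{1}{2} - \sqrt{(n - m_n)(n - m_n - 1)}\right],$$

where we used the fact that $\lambda_n = \frac{(n - m_n)(n - m_n - 1)}{2n}$, and the second term in the formula above is $O(n^{-3/2})$.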
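As a sanity check on this computation, the sum defining $\lambda_{n,2}$ can be compared numerically with the closed form $(n-m_n)(n-m_n-1)(n-m_n-\tfrac12)/(3n^2)$ and with the leading term $(2\lambda_n)^{3/2}/(3\sqrt n)$. This is only a sketch; the helper `lam_j` and the chosen `n`, `m` are assumptions for illustration.

```python
import math

def lam_j(n, m, j):
    """lambda_{n,j} = sum over i = m+1..n of (1 - i/n)^j."""
    return sum((1 - i / n) ** j for i in range(m + 1, n + 1))

n, m = 10_000, 10_000 - 141
d = n - m
lam_n = lam_j(n, m, 1)
lam_n2 = lam_j(n, m, 2)

closed = d * (d - 1) * (d - 0.5) / (3 * n**2)        # exact closed form
leading = (2 * lam_n) ** 1.5 / (3 * math.sqrt(n))    # leading term only

print(lam_n, d * (d - 1) / (2 * n))
print(lam_n2, closed, leading, lam_n2 - leading)
```

The difference between `lam_n2` and `leading` is several orders of magnitude smaller than `lam_n2` itself, in line with the remainder being of lower order.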

We shall need the following simple result for the proof of the theorem. 
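Proposition. If $\{m_n\}_{n \in \mathbb{N}}$ is a sequence of integers that satisfies (1), then

$$\lambda_n = \lambda_{n,1} := \sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right) \to \lambda, \tag{2}$$

and

$$\lambda_{n,j} := \sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right)^j < \lambda_n \left(\frac{2\lambda_n}{n}\right)^{\frac{j-1}{2}}, \quad\text{and}\quad \lambda_{n,j} \to 0, \qquad j = 2, 3, \ldots \tag{3}$$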



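Proof. (2) is true, because

$$\lambda_n = \frac{1}{n} \sum_{i=m_n+1}^{n} (n - i) = \frac{(n - m_n)(n - m_n - 1)}{2n} = \frac{1}{2} \cdot \frac{n - m_n}{\sqrt{n}} \cdot \frac{n - m_n - 1}{\sqrt{n}} \to \lambda$$

by (1). Since $(n - m_n - 1)^2 < (n - m_n)(n - m_n - 1) = 2n\lambda_n$, by taking the square root of both sides of the equality above it can be deduced that

$$\frac{n - m_n - 1}{\sqrt{n}} < \sqrt{2\lambda_n}. \tag{4}$$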

Now we prove the first assertion of (3) by induction. For an arbitrary $j = 2, 3, \ldots$ we bound $\lambda_{n,j}$ as follows:

$$\lambda_{n,j} = \sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right)\left(1 - \frac{i}{n}\right)^{j-1} \le \frac{n - m_n - 1}{n}\, \lambda_{n,j-1}.$$

Since for $j = 2$ this gives $\lambda_{n,2} < \lambda_n \sqrt{2\lambda_n / n}$ by (4), we have the first part of (3) in this case. If we have the same result for some $j \ge 2$, then it holds true for $j + 1$ as well by the argument above, (4) and the induction hypothesis. Since $\lambda_n \to \lambda$ by (2), the second part of (3) follows from the first. □
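The induction can be mirrored numerically. The sketch below checks, for one choice of parameters, that $\lambda_{n,j}$ stays below $\lambda_n (2\lambda_n/n)^{(j-1)/2}$, which is the form of the bound in (3) as I read it from the proof; the helper name and the values of `n` and `m` are illustrative assumptions.

```python
def lam_j(n, m, j):
    """lambda_{n,j} = sum over i = m+1..n of (1 - i/n)^j."""
    return sum((1 - i / n) ** j for i in range(m + 1, n + 1))

n, m = 5000, 5000 - 100
lam_n = lam_j(n, m, 1)

rows = []
for j in range(2, 7):
    bound = lam_n * (2 * lam_n / n) ** ((j - 1) / 2)  # assumed form of the bound in (3)
    rows.append((j, lam_j(n, m, j), bound))
    print(j, lam_j(n, m, j), bound)
```

Each successive $\lambda_{n,j}$ shrinks by roughly a factor of order $1/\sqrt{n}$, which is what drives the expansion in the proof of the Theorem.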

Proof of the Theorem. We are going to represent each possible outcome of the collector's sampling with a sequence of integers in the following way: let us suppose that while sampling (with replacement), the collector labels the distinct coupons he draws from 1 to $n - m_n$ in the order he obtains them in the course of time, and after each draw he writes down the label of the coupon just drawn. So he begins the enumeration of labels with a 1 after the first draw, and each number that he writes to the end of his list after a draw is either the label already on the coupon he just got (if he had drawn the same one before), or it is the label he gives the coupon at that moment, which would be the smallest positive integer he has not yet used in the process of sampling and labeling. In the first case we call the new member of the sequence "superfluous", while in the second case we call it a "first appearance".
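The labeling scheme is easy to express in code. The following sketch is an illustration only: the function names and the sample draw sequence are hypothetical. It turns a sequence of drawn coupons into the label sequence described above, and counts the "superfluous" members between consecutive "first appearances".

```python
def label_draws(draws):
    """Return the label sequence and, per draw, whether it is a first appearance."""
    labels = {}          # coupon -> label, assigned in order of first appearance
    seq, kinds = [], []
    for coupon in draws:
        if coupon in labels:
            kinds.append("superfluous")
        else:
            labels[coupon] = len(labels) + 1  # smallest unused positive integer
            kinds.append("first appearance")
        seq.append(labels[coupon])
    return seq, kinds

def block_counts(kinds):
    """Number of superfluous draws between consecutive first appearances."""
    counts, current = [], 0
    for kind in kinds[1:]:  # the first draw is always a first appearance
        if kind == "first appearance":
            counts.append(current)
            current = 0
        else:
            current += 1
    return counts

seq, kinds = label_draws(["b", "a", "b", "d", "a", "c"])
print(seq)                  # [1, 2, 1, 3, 2, 4]
print(block_counts(kinds))  # [0, 1, 1]
```

In this example four distinct coupons are collected with two superfluous draws, split into blocks of sizes 0, 1 and 1.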

We fix an arbitrary $k \in \mathbb{N}$, and we suppose that $n$ is so big that $n - m_n > k$ holds. Now $\widetilde W_{n,m_n} = k$ means that the collector had $k$ "superfluous" draws, thus the corresponding representing sequence contains $n - m_n$ "first appearances" and $k$ "superfluous" members. We categorize all such outcomes according to how the $k$ "superfluous" draws are split into blocks by the $n - m_n$ "first appearances" in the representing sequences: to each vector $\mathbf{k} = (k_{m_n+1}, k_{m_n+2}, \ldots, k_{n-1})$, where $k_i \in \mathbb{Z}_+$, $i = m_n + 1, \ldots, n - 1$, and $\sum_{i=m_n+1}^{n-1} k_i = k$, correspond the sequences where there are $k_{n-1}$ "superfluous" members between the 1st and 2nd "first appearances", $k_{n-2}$ "superfluous" members between the 2nd and 3rd "first appearances", and so on, $k_{m_n+1}$ "superfluous" members between the $(n - m_n - 1)$th and $(n - m_n)$th "first appearances". (This is the same as saying that $X_{n,i} = k_i$ for all $i = m_n + 1, \ldots, n - 1$; recall that $X_{n,n} = 0$.) The probability of getting such a sequence is

$$\prod_{i=m_n+1}^{n} \frac{i}{n} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i}.$$

It follows that

$$P\{\widetilde W_{n,m_n} = k\} = \prod_{i=m_n+1}^{n} \frac{i}{n} \sum_{\mathbf{k} \in I_k} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i}, \tag{5}$$

where $I_k$ denotes the set of all vectors $\mathbf{k}$ described above.

Now we are going to examine the sum in (5), which we denote by $S_{n,m_n,k} = S_k$. For $k = 0$ the only vector in $I_0$ is the zero vector and the corresponding product is empty, thus $S_0 = 1$ by definition. Now let us suppose that $k \ge 2$; we are going to return to the cases $k = 0$ and $1$ later on. For an arbitrary such $k$ we see that

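$$I_k = \bigcup_{l=1}^{k} I_{k,l}, \quad\text{where}\quad I_{k,l} = \{\mathbf{k} \in I_k : \mathbf{k} \text{ has exactly } l \text{ nonzero components}\}, \quad l = 1, \ldots, k,$$

and we correspondingly define $S_{k,l}$ to be the part of $S_k$ that contains the summands over $\mathbf{k} \in I_{k,l}$, thus we have

$$S_k = \sum_{\mathbf{k} \in I_k} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} = \sum_{l=1}^{k} \sum_{\mathbf{k} \in I_{k,l}} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} = \sum_{l=1}^{k} S_{k,l}. \tag{6}$$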

To determine the limit of $S_k$ we examine the asymptotic behavior of the $S_{k,l}$ expressions separately. We fix an arbitrary $l = 1, \ldots, k$, and with $|A|$ denoting the cardinality of an arbitrary set $A$, we now calculate $|I_{k,l}|$. We can think of the vectors in $I_k$ as the results of distributing $k$ 1-s in $n - m_n - 1$ spaces in all possible ways: to each of these distributions corresponds a vector in $I_k$ whose $i$th component is the number of 1-s put in the $i$th space, $i = m_n + 1, \ldots, n - 1$. To produce a vector in $I_{k,l}$ we first choose $l$ different spaces, and we put a 1 in each of them, then we distribute the remaining $k - l$ 1-s in these previously chosen $l$ spaces that already have a 1, but this time any such space can be chosen more than once. This gives

$$|I_{k,l}| = \binom{n - m_n - 1}{l} \binom{k - 1}{k - l}, \qquad l = 1, \ldots, k.$$
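The counting argument can be verified by brute force for small sizes. In the sketch below, `N` plays the role of the number of spaces $n - m_n - 1$; the function name and the small values of `N` and `k` are assumptions chosen so that full enumeration is cheap.

```python
from itertools import product
from math import comb

def count_vectors(N, k, l):
    """Count nonnegative integer vectors of length N with sum k and exactly l nonzero entries."""
    return sum(
        1
        for v in product(range(k + 1), repeat=N)
        if sum(v) == k and sum(1 for x in v if x) == l
    )

N, k = 4, 3
for l in range(1, k + 1):
    print(l, count_vectors(N, k, l), comb(N, l) * comb(k - 1, k - l))
```

For `N = 4`, `k = 3` both columns read 4, 12, 4, and the total 20 agrees with the number $\binom{N + k - 1}{k}$ of all vectors with sum $k$.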



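We obviously bound $S_{k,l}$ from above if we replace each of the factors in its products by the largest one of them, namely by $1 - \frac{m_n + 1}{n}$. This together with the just calculated formula gives

$$S_{k,l} \le \binom{n - m_n - 1}{l} \binom{k - 1}{k - l} \left(1 - \frac{m_n + 1}{n}\right)^k \le \frac{1}{l!} \binom{k - 1}{k - l} \left(\frac{n - m_n - 1}{\sqrt{n}}\right)^{k+l} \left(\frac{1}{\sqrt{n}}\right)^{k-l}.$$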

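Hence by (4) we have

$$S_{k,l} < \frac{1}{l!} \binom{k - 1}{k - l} (2\lambda_n)^{\frac{k+l}{2}} \left(\frac{1}{\sqrt{n}}\right)^{k-l} \quad\text{and}\quad \sum_{l=1}^{l'} S_{k,l} < k!\, \max\{1, (2\lambda_n)^k\} \left(\frac{1}{\sqrt{n}}\right)^{k-l'} \tag{7}$$

for any $l' \in \{1, \ldots, k\}$. We see from the first inequality that $S_{k,l}$ goes to 0 for $l = 1, \ldots, k - 1$, but it gives a constant upper bound for $l = k$. We are going to examine the latter case more carefully. Notice that the components of a vector in $I_{k,k}$ are all 0-s and 1-s, thus for any $\mathbf{k} \in I_{k,k}$ we have $k_{m_n+1}!\, k_{m_n+2}! \cdots k_{n-1}! = 1$. Using this and the decomposition of the index set $I_k = \bigcup_{l=1}^{k} I_{k,l}$ we obtain

$$S_{k,k} = \sum_{\mathbf{k} \in I_k} \frac{1}{k_{m_n+1}!\, k_{m_n+2}! \cdots k_{n-1}!} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} - \sum_{l=1}^{k-1} \sum_{\mathbf{k} \in I_{k,l}} \frac{1}{k_{m_n+1}!\, k_{m_n+2}! \cdots k_{n-1}!} \prod_{j=m_n+1}^{n-1} \left(1 - \frac{j}{n}\right)^{k_j}.$$

The first term of $S_{k,k}$ is equal to $\frac{1}{k!} \left[\sum_{i=m_n+1}^{n} \left(1 - \frac{i}{n}\right)\right]^k$ by the multinomial theorem, thus we have

$$S_{k,k} = \frac{\lambda_n^k}{k!} - \sum_{l=1}^{k-1} \sum_{\mathbf{k} \in I_{k,l}} \frac{1}{k_{m_n+1}! \cdots k_{n-1}!} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i}. \tag{8}$$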

It follows that $\lim_{n \to \infty} S_{k,k} = \frac{\lambda^k}{k!}$, because we have (2), and the sum above can be bounded by $\sum_{l=1}^{k-1} S_{k,l}$, which goes to 0 by (7). Thus putting together our results for the expressions $S_{k,l}$ in (6), we conclude that the part of $S_k$ that counts, in the sense that it asymptotically contributes a positive constant to $S_k$, is $S_{k,k}$, which is the part of the sum in the defining formula of $S_k$ that corresponds to the 0-1 vectors of the $I_k$ index set.

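If we write (8) into (6), we obtain the following formula for $S_k$:

$$S_k = \frac{\lambda_n^k}{k!} + \sum_{l=1}^{k-1} R_{k,l}, \tag{9}$$

where

$$R_{k,l} = \sum_{\mathbf{k} \in I_{k,l}} \left(1 - \frac{1}{k_{m_n+1}! \cdots k_{n-1}!}\right) \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i}, \qquad l = 1, \ldots, k - 1.$$

Our aim is to determine the first order term of the error when we approximate $S_k$ by $\frac{\lambda_n^k}{k!}$. Since $R_{k,l} < S_{k,l}$ for each $l = 1, \ldots, k - 1$, and for the latter expressions we have the bounds of (7), we see that $\sum_{l=1}^{k-1} R_{k,l} = O\!\left(\frac{1}{\sqrt{n}}\right)$, and the same, but more detailed argument also gives

$$\sum_{l=1}^{k-2} R_{k,l} < \sum_{l=1}^{k-2} S_{k,l} < k!\, \max\{1, (2\lambda_n)^k\}\, \frac{1}{n}. \tag{10}$$

Thus the leading term of the error $S_k - \frac{\lambda_n^k}{k!}$ is of order $\frac{1}{\sqrt{n}}$ and it comes from the term $R_{k,k-1}$.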
Before examining $R_{k,k-1}$ we introduce some notation for further use. As an analogue of the set $I_{k,l}$ we define $I_{k-2,l}$ to be the set of vectors $\mathbf{k} \in \mathbb{Z}_+^{n - m_n - 1}$ such that $\sum_{i=m_n+1}^{n-1} k_i = k - 2$ and $\mathbf{k}$ has exactly $l$ nonzero components, $l = 1, \ldots, k - 2$. Also, as an analogue of the expressions $S_{k,l}$ and $S_k$ we define $S_{k-2,l}$ and $S_{k-2}$ by the formulas in (6) with $k$ replaced by $k - 2$. Finally we introduce

$$I^{j}_{k-2,k-2} = \{\mathbf{k} \in I_{k-2,k-2} : k_j = 0\}, \qquad j = m_n + 1, \ldots, n - 1.$$

We now return to $R_{k,k-1}$. The corresponding index set $I_{k,k-1}$ contains vectors that have exactly one component equal to 2, $k - 2$ components equal to 1, and the rest 0. Thus we have

$$R_{k,k-1} = \frac{1}{2} \sum_{\mathbf{k} \in I_{k,k-1}} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i}.$$
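We can write $R_{k,k-1}$ in another form, if we first sum according to the component of the vectors in $I_{k,k-1}$ which equals 2:

$$R_{k,k-1} = \frac{1}{2} \sum_{j=m_n+1}^{n-1} \left(1 - \frac{j}{n}\right)^2 \left[\, \sum_{\mathbf{k} \in I_{k-2,k-2}} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} - \sum_{\substack{\mathbf{k} \in I_{k-2,k-2} \\ k_j = 1}} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} \right].$$

We recognize $S_{k-2,k-2}$ in the first sum in the brackets, thus we can replace it by the formula in (8) with $k - 2$ in the place of $k$. As for the second sum in the brackets, we see that $k_j = 1$, so there is a $1 - \frac{j}{n}$ factor in each of the products, which we can bring before the brackets. These considerations lead to

$$R_{k,k-1} = \frac{\lambda_{n,2}}{2} \cdot \frac{\lambda_n^{k-2}}{(k-2)!} - \frac{\lambda_{n,2}}{2} \sum_{l=1}^{k-3} \sum_{\mathbf{k} \in I_{k-2,l}} \frac{1}{k_{m_n+1}! \cdots k_{n-1}!} \prod_{i=m_n+1}^{n-1} \left(1 - \frac{i}{n}\right)^{k_i} - \frac{1}{2} \sum_{j=m_n+1}^{n-1} \left(1 - \frac{j}{n}\right)^3 \sum_{\substack{\mathbf{k} \in I_{k-2,k-2} \\ k_j = 1}} \prod_{i \ne j} \left(1 - \frac{i}{n}\right)^{k_i}. \tag{11}$$

Now we bound the last two expressions. First,
by (3) and the second inequality in (7) with $k$ replaced by $k - 2$. Next,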
by (H]), (IH]) and the first inequality in (7) with $k$ replaced by $k - 2$ and $l = k - 2$.
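We conclude that if we write (11) into (9), we obtain $S_k$ as the sum of $\frac{\lambda_n^k}{k!}$, $\frac{\lambda_{n,2}}{2} \cdot \frac{\lambda_n^{k-2}}{(k-2)!}$ and a remainder that collects the two expressions just bounded together with $\sum_{l=1}^{k-2} R_{k,l}$; this remainder is $O\!\left(\frac{1}{n}\right)$ by (12), (13), (10) and the fact that $\lambda_n \to \lambda$ by (2).
Thus

$$S_k = \frac{\lambda_n^k}{k!} + \frac{\lambda_{n,2}}{2} \cdot \frac{\lambda_n^{k-2}}{(k-2)!} + O\!\left(\frac{1}{n}\right). \tag{14}$$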
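The conclusion of this step, namely that $S_k$ equals $\lambda_n^k/k! + \frac{\lambda_{n,2}}{2}\,\lambda_n^{k-2}/(k-2)!$ up to $O(1/n)$, can be probed numerically. The sketch below rests on one observation that is mine rather than the note's: $S_k$ is the sum of all degree-$k$ monomials in the numbers $x_i = 1 - i/n$, so it can be computed by a short dynamic program. The function name and the values of `n` and `m` are hypothetical.

```python
from math import factorial

def sum_of_monomials(xs, k):
    """Sum over all nonnegative integer vectors with total k of prod x_i^{k_i}."""
    h = [1.0] + [0.0] * k
    for x in xs:
        for d in range(1, k + 1):
            h[d] += x * h[d - 1]  # uses the already-updated h[d-1]: allows repeated powers of x
    return h[k]

n, m = 10_000, 10_000 - 141
xs = [1 - i / n for i in range(m + 1, n)]
lam_n = sum(xs)
lam_n2 = sum(x * x for x in xs)

results = {}
for k in (2, 3, 4):
    s_k = sum_of_monomials(xs, k)
    poisson_part = lam_n**k / factorial(k)
    corrected = poisson_part + lam_n2 / 2 * lam_n ** (k - 2) / factorial(k - 2)
    results[k] = (s_k, poisson_part, corrected)
    print(k, s_k - poisson_part, s_k - corrected)
```

The first printed difference is of the order of $1/\sqrt{n}$, while the second is of the order of $1/n$ or smaller.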
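Now we return to (5), and approximate the product $\prod_{i=m_n+1}^{n} \frac{i}{n}$ by $e^{-\lambda_n}$. Using the definition of $\lambda_n$ in (2) and the expansion formula of the logarithm function the error of the approximation can be written in the form

$$\prod_{i=m_n+1}^{n} \frac{i}{n} - e^{-\lambda_n} = \exp\left\{\sum_{i=m_n+1}^{n} \log\left(1 - \left(1 - \frac{i}{n}\right)\right)\right\} - e^{-\lambda_n} = e^{-\lambda_n} \left[\exp\left\{-\sum_{j=2}^{\infty} \frac{\lambda_{n,j}}{j}\right\} - 1\right], \tag{15}$$

where the expressions $\lambda_{n,j}$ are defined as in (3). Thus we have

$$\prod_{i=m_n+1}^{n} \frac{i}{n} = e^{-\lambda_n} - e^{-\lambda_n} \frac{\lambda_{n,2}}{2} + R_n,$$

where

$$R_n = e^{-\lambda_n} \left[\exp\left\{-\sum_{j=2}^{\infty} \frac{\lambda_{n,j}}{j}\right\} - 1 + \frac{\lambda_{n,2}}{2}\right],$$

and we are going to show that $R_n = O\!\left(\frac{1}{n}\right)$.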

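We are going to bound the sum in the exponent in (15). Since $\lambda_n \to \lambda$ by (2), there exists a threshold number $n_0$ such that for all $n > n_0$ we have $\sqrt{2\lambda_n / n} < \frac{1}{2}$. This with inequality (3) yields

$$\sum_{j=j_0}^{\infty} \frac{\lambda_{n,j}}{j} < \lambda_n \sum_{j=j_0}^{\infty} \left(\frac{2\lambda_n}{n}\right)^{\frac{j-1}{2}} < 2\lambda_n \left(\frac{2\lambda_n}{n}\right)^{\frac{j_0 - 1}{2}}, \qquad j_0 = 2, 3, \ldots, \tag{16}$$

for all $n > n_0$. Let us suppose that $n$ satisfies this condition from now on.

Now we bound $|R_n|$. First we apply the triangle inequality, then the inequality $|e^{-x} - 1 + x| < \frac{x^2}{2}$ valid for all positive real $x$ with $x = \sum_{j=2}^{\infty} \frac{\lambda_{n,j}}{j}$, and finally use inequality (16) with $j_0 = 2$ and 3. Thus we obtain

$$|R_n| \le e^{-\lambda_n} \left[\, \left|e^{-x} - 1 + x\right| + \sum_{j=3}^{\infty} \frac{\lambda_{n,j}}{j} \right] < e^{-\lambda_n} \left[\frac{x^2}{2} + \sum_{j=3}^{\infty} \frac{\lambda_{n,j}}{j}\right] < e^{-\lambda_n}\, \frac{4\lambda_n^3 + 4\lambda_n^2}{n} = O\!\left(\frac{1}{n}\right).$$

Recalling (15) we see that we proved

$$e^{-\lambda_n} - \prod_{i=m_n+1}^{n} \frac{i}{n} = e^{-\lambda_n} \frac{\lambda_{n,2}}{2} + O\!\left(\frac{1}{n}\right). \tag{17}$$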
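The approximation of the product $\prod_{i=m_n+1}^{n} i/n$ by $e^{-\lambda_n}(1 - \lambda_{n,2}/2)$ can be checked directly. The following sketch uses arbitrarily chosen `n` and `m` (assumptions for illustration) and compares the two errors.

```python
import math

n, m = 10_000, 10_000 - 141
xs = [1 - i / n for i in range(m + 1, n + 1)]
lam_n = sum(xs)
lam_n2 = sum(x * x for x in xs)

product = math.prod(i / n for i in range(m + 1, n + 1))
zeroth_order = math.exp(-lam_n)                      # plain e^{-lambda_n}
first_order = math.exp(-lam_n) * (1 - lam_n2 / 2)    # with the lambda_{n,2} correction

print(product - zeroth_order)  # error of order 1/sqrt(n)
print(product - first_order)   # error of order 1/n or smaller
```

Since $P\{\widetilde W_{n,m_n} = 0\}$ equals this product, the same comparison illustrates the $k = 0$ case of the Theorem.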

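Finally, recalling (5) we have

$$P\{\widetilde W_{n,m_n} = 0\} = \prod_{j=m_n+1}^{n} \frac{j}{n} = e^{-\lambda_n} - \left(e^{-\lambda_n} - \prod_{j=m_n+1}^{n} \frac{j}{n}\right)$$

for $k = 0$,

$$P\{\widetilde W_{n,m_n} = 1\} = \left(\prod_{j=m_n+1}^{n} \frac{j}{n}\right) \lambda_n = e^{-\lambda_n} \lambda_n - \left(e^{-\lambda_n} - \prod_{i=m_n+1}^{n} \frac{i}{n}\right) \lambda_n$$

for $k = 1$, and

$$P\{\widetilde W_{n,m_n} = k\} = \left(\prod_{j=m_n+1}^{n} \frac{j}{n}\right) S_k = e^{-\lambda_n} S_k - \left(e^{-\lambda_n} - \prod_{i=m_n+1}^{n} \frac{i}{n}\right) S_k$$

for $k \ge 2$. We obtain the first assertion of the Theorem if we write (14) and (17) into these expressions. The second assertion follows from the first and (2). □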